Multi-label Text Classification of German Language Medical Documents
Authors
Abstract
Background and Objective: At nearly every patient visit, medical documents are produced and stored in a medical record, often in unstructured form as free text. The growing number of stored documents increases the need for effective and timely information retrieval. We developed a multi-label classification system to categorize German-language free-text medical documents (e.g., discharge letters, clinical findings, reports) into predefined classes. A random sample of 1,500 free-text medical documents was retrieved from a general hospital information system, and a domain expert manually assigned each document to between one and eight categories. This sample was used to train and evaluate four classification schemes: Naïve Bayes, kNN, SVM, and J48. The effect of text preprocessing was also tested. In our study, preprocessing improved performance, and J48 classification achieved the best results.
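As an illustration of the setup the abstract describes, the following is a minimal sketch of multi-label text classification with simple preprocessing. It rests on several assumptions that are not stated in the abstract: the multi-label problem is decomposed into one yes/no Naïve Bayes model per category (binary relevance), preprocessing is limited to lowercasing, tokenization, and a tiny hypothetical stop-word list, and the documents and category names (`DOCS`, `LABELS`) are invented toy data. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny German stop-word list for illustration only.
STOPWORDS = {"der", "die", "das", "und", "mit", "bei", "ohne", "ein", "eine"}


def preprocess(text):
    """Lowercase, keep runs of German letters, drop stop words."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class BinaryRelevanceNB:
    """One yes/no multinomial Naive Bayes model per label (assumed scheme)."""

    def fit(self, docs, label_sets):
        self.labels = sorted(set().union(*label_sets))
        self.vocab = {t for d in docs for t in preprocess(d)}
        self.models = {}
        for label in self.labels:
            counts = {True: Counter(), False: Counter()}
            n_docs = {True: 0, False: 0}
            for doc, labels in zip(docs, label_sets):
                has = label in labels
                n_docs[has] += 1
                counts[has].update(preprocess(doc))
            self.models[label] = (counts, n_docs)
        return self

    def _log_score(self, tokens, counts, n_docs, has):
        total = sum(n_docs.values())
        # Laplace-smoothed class prior and token likelihoods.
        score = math.log((n_docs[has] + 1) / (total + 2))
        denom = sum(counts[has].values()) + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[has][t] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the set of labels whose 'yes' model outscores 'no'."""
        tokens = preprocess(doc)
        predicted = set()
        for label, (counts, n_docs) in self.models.items():
            yes = self._log_score(tokens, counts, n_docs, True)
            no = self._log_score(tokens, counts, n_docs, False)
            if yes > no:
                predicted.add(label)
        return predicted


# Invented toy documents and category names, not data from the study.
DOCS = [
    "Entlassungsbrief Kardiologie: Patient mit Herzinsuffizienz",
    "Befund Radiologie: Röntgen Thorax ohne Befund",
    "Entlassungsbrief Radiologie: Röntgen Thorax unauffällig",
    "Befund Kardiologie: EKG bei Herzinsuffizienz",
]
LABELS = [
    {"discharge_letter", "cardiology"},
    {"finding", "radiology"},
    {"discharge_letter", "radiology"},
    {"finding", "cardiology"},
]

model = BinaryRelevanceNB().fit(DOCS, LABELS)
print(model.predict("Entlassungsbrief: EKG und Herzinsuffizienz"))
```

Because each category gets its own independent model, a document can receive any number of labels, which matches the 1-to-8-category annotation described above. The other classifiers named in the abstract (kNN, SVM, J48) could be substituted for the per-label model under the same decomposition.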
Similar resources
Enhanced Information Retrieval from Narrative German-language Clinical Text Documents using Automated Document Classification
The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This...
Prototype of a Medical Information Retrieval System for Electronic Patient Records: Finding relevant information in clinical text documents
The Steiermärkische Krankenanstalten Ges.m.b.H. (KAGes) conducted the roll-out of an electronic patient record (EPR) system in 2004. This system contains an increasing number of unstructured clinical text documents in German. To facilitate patient-related medical decision-making for physicians, this diploma thesis analyses and implements methods for retrieving relevant medical...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...
Multi-label Classification of Product Reviews Using Structured SVM
Most text classification problems are associated with multiple class labels, and hence automatic text classification is one of the most challenging and prominent research areas. Text classification is the problem of categorizing text documents into different classes. In the multi-label classification scenario, each document may have more than one label. The real challenge in ...
A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy
Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale, unstructured data becomes a challenge. Currently, many state-of-the-art text mining methods have been applied to the classification process, many of them based on the key wor...